% !TeX spellcheck = en_US
\input{preamble.tex}
\title{Lot Sizing under Distribution-free Random Yield}
\date{\today}
\author{Reza Skandari, Mahesh Nagarajan}
\begin{document}
\maketitle
\bibliographystyle{authordate3}
\section{Introduction}
Consider a firm that produces a product to meet an external demand. The product has a random yield whose distribution is affected by the production lot size. Since the yield is random, there can be overage or underage after the yield and demand are realized. The firm wants to minimize its expected cost, which may include production, shortage, and holding costs. This problem is well studied under the assumption that the distributions of demand and yield are known (see \cite{Henig1990}).

In this paper, we relax the assumption that the distribution of the yield is known. This is topical when the product is new and the firm has little or no prior information about its production yield. One strategy firms use in such situations is to take samples of the unknown variable and use the empirical distribution in place of the true distribution. In the case of random yields, a firm can experiment on the yield of the new product at a small scale before committing to a large-scale lot size. This strategy works in situations in which the information gathered in the small-scale production is applicable to the real, larger scale. An example of such a case is when the yield is \textit{stochastically proportional} to the input.

A natural question to ask, then, concerns the performance of the policy found using the empirical distribution (instead of the true, unknown distribution). The performance of such a policy depends on many factors, such as the cost structure, the shape of the true distribution, and the sample size. Unlike most other factors, the sample size can be controlled by the decision maker, and thus one can seek a performance guarantee by taking ``enough'' samples.

In this paper, we investigate a special case of this problem and determine a minimum sample size that, with high confidence, guarantees a given relative performance of the policy found using the empirical distribution (relative to that of the optimal policy). We extend the result of \cite{Levi2009} to the lot sizing problem in the presence of random yield.
\section{Model Setting}
We assume a single-period problem in which a firm faces a known and deterministic demand. This can be the case when a firm commits to providing an amount of the product to another firm, or when it wants to meet a point forecast. The random yield is stochastically proportional with an unknown distribution. We assume linear overage and underage costs. The following notation is used in defining a mathematical model for this problem:

\begin{itemize}
\item $d$: demand (given deterministically)
\item $I_0$: initial inventory
\item $h$, $b$: per-unit overage and underage costs (both strictly positive)
\item $U$: random yield proportion (whose distribution is unknown a priori)
\item $q$: lot size
\item $C(q;d)$: the expected cost for lot size $q$ given demand $d$
\end{itemize}
We assume that  $U \ge 0$ with probability one. Based on the problem definition, we want to minimize the cost function $C(q;d)$ defined as follows:
\begin{align} \label{eq:cq}
C(q;d)= \Ex_{U} [h(qU+I_0-d)^+ +b(d-qU-I_0)^+],
\end{align}
where $x^+=\max(x,0)$.

The following two properties simplify our model:
\begin{enumerate}
\item Without loss of generality, we can assume $I_0=0$. To see this, let $d'=d-I_0$. If $d'=0$, then we can easily show that $q^*=0$. If $d' < 0$, then again $q^*=0$, using the fact that $U \ge 0$ with probability one. Otherwise, we can take $d'>0$ as the net demand and assume no initial inventory (see Equation \ref{eq:cq}).
\item Without loss of generality, we can assume $d=1$. We can prove that the optimal lot size is proportional to $d$, i.e., $\dfrac{q^*(d)}{d}$ is constant. Let $r=\dfrac{q}{d}$. Then, we have:
\begin{align*} 
C(q;d)= \Ex_{U} [h(rdU-d)^+ +b(d-rdU)^+]=d\Ex_{U} [h(rU-1)^+ +b(1-rU)^+]=dC(r;1).
\end{align*}
Let $r^*$ be the optimal lot size for $d=1$, i.e., the minimizer of $C(r;1)$. Then, we have $C(q^*(d);d)=dC(r^*;1)$ and $q^*(d)=r^*d$.
\end{enumerate}
The second property has another significance as well. Based on it, the sampling and experimentation on the unknown random variable can be done at a scale smaller than that of the real problem, and thus more cheaply than when working at the real-world scale (for instance, the setting of \cite{Levi2009}). Note that any guarantee on the performance ratio established for $d=1$ carries over to the case of arbitrary $d$ (since $C(q;d)$ is linear in $d$). Therefore, we assume that $d=1$ and write $C(r)$ instead of $C(r;1)$ for ease of notation.
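For illustration, with hypothetical numbers: if experimentation at unit scale gives $r^*=1.25$ and $C(r^*;1)=0.3$, then for a real demand of $d=1000$ units, the optimal lot size is $q^*(1000)=1250$ units, with expected cost $1000 \cdot 0.3=300$.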

\section{The Optimal Solution}
In this section, our goal is to find a lot size $r$ that minimizes the cost function $C(r)$ defined below:
\begin{equation*} 
C(r)= \Ex_{U} [h(rU-1)^+ +b(1-rU)^+],
\end{equation*}
where the expectation is taken with respect to the random yield $U$ with a known distribution. We further assume that $\Ex U > 0$, and that $0 \le U \le M$ for some known $M$.

We will use the sample average approximation (SAA) method to solve the problem when we do not have access to the true distribution of $U$. Since the empirical distribution is discrete, we also show some properties of the optimal solution for the case in which $U$ is a discrete random variable.

We can easily show that $C(r)$ is a convex function: for each realization $u$, the integrand equals $\max\{h(ru-1),\, b(1-ru)\}$, a maximum of affine functions of $r$, and taking expectation preserves convexity. We derive explicit right-hand and left-hand derivatives of $C$ in order to show the properties of the optimal solution. By interchanging the order of integration (expectation with respect to $U$) and the limit (one-sided differentiation with respect to $r$), we can derive the one-sided derivatives. Such an interchange can be justified using a standard dominated convergence theorem.

Let $C^l(r)$ and $C^r(r)$ be the left-hand and right-hand derivatives of $C$ at $r$, respectively. Let $C(r,u)$ denote the cost when the random yield equals $u$, and let $C^l(r,u)$ and $C^r(r,u)$ denote its one-sided derivatives with respect to $r$. Then, we have:
\begin{equation*}
C^l(r,u)=\begin{cases} hu:& ur > 1 \\
-bu:& ur \le 1 \\
\end{cases}
\end{equation*}
Similarly, 
\begin{equation*}
C^r(r,u)=\begin{cases} hu:& ur \ge 1 \\
-bu:& ur < 1 \\
\end{cases}
\end{equation*}
Let $\mathbf{p}$ be the probability measure of the random variable $U$. By taking the expectation of $C^l(r,u)$ with respect to $U$, we obtain:
\begin{align*} %\label{eq:leftder}
C^l(r)=h \int_{ru>1} u d \mathbf{p}-b \int_{ru \le 1} u d\mathbf{p}=h\Ex [U]-(b+h)\int_{ru \le 1} u d\mathbf{p},
\end{align*}
in which the integrals are Lebesgue integrals with respect to $\mathbf{p}$.
Similarly, we have
\begin{align*} % \label{eq:rightder}
C^r(r)=h \int_{ru \ge 1} u d \mathbf{p}-b \int_{ru < 1} u d\mathbf{p} =(b+h) \int_{ru \ge 1} u d \mathbf{p} -b\Ex [U].
\end{align*}
Note that for any $r$, both one-sided derivatives are bounded, as we have
$$-b\Ex [U] \le C^{l}(r),\ C^{r}(r) \le h\Ex [U].$$
Since we assumed that $0 \le U \le M$, we obtain
$$-bM\le C^{l}(r),\ C^{r}(r) \le h M.$$

Since the cost function is convex, an optimal solution $r^*$ to $\min_{r\ge 0} C(r)$ satisfies  $C^{r}(r^*) \ge 0 \ge C^{l}(r^*) $, which implies that zero is a subgradient of $C$ at $r^*$.
We choose the smallest solution from the optimal solution set, so that the optimal solution is uniquely defined. In other words, we assume 
\begin{align} \label{eq: optratio}
r^*=\inf \{r: C^{r}(r) \ge 0\}.
\end{align}
Note that $r^* > 0$ since we assumed that $\Ex U > 0$; indeed, $C^r(0)=-b\Ex U<0$.
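As a concrete illustration of Equation \ref{eq: optratio}, consider a hypothetical two-point yield distribution with $\pr[U=0.5]=\pr[U=1]=\frac{1}{2}$ and costs $h=1$, $b=2$, so that $\Ex U=0.75$. Then
\begin{align*}
C^r(r)=(b+h) \int_{ru \ge 1} u d \mathbf{p} -b\Ex [U]=
\begin{cases}
3(0)-1.5=-1.5 & 0 \le r < 1 \\
3(0.5)-1.5=0 & 1 \le r < 2 \\
3(0.75)-1.5=0.75 & r \ge 2.
\end{cases}
\end{align*}
Every $r \in [1,2]$ is optimal, with cost $C(r)=0.5$, and Equation \ref{eq: optratio} selects the smallest of them, $r^*=1$.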
\section{Sample Average Approximation}
Assume we have $n$ independent samples from $U$, denoted by $u_i: i=1,\ldots, n$. With respect to the empirical distribution induced by the sample, SAA minimizes the approximate cost function $\widehat{C}(r)$ defined below:
\begin{align*}
\widehat{C}(r)= \frac{1}{n} \sum\limits_{i=1}^{n} \big[ h(ru_i-1)^+ +b(1-ru_i)^+ \big]
\end{align*}
Let $\widehat{R}$ be a solution that minimizes the SAA counterpart. Using Equation \ref{eq: optratio} applied to the empirical distribution, we can show that $\widehat{r}$, a realization of $\widehat{R}$ with respect to the $u_i$'s, satisfies the following:
\begin{align}\label{eq:rhatdef}
\widehat{r}=\begin{cases} 
\bigg[ \max \bigg \{ u_j:  (b+h) \sum\limits_{u_i <u_j} u_i \le h \sum\limits_{i=1}^{n} u_i  \bigg\}\bigg]^{-1} & \text{if } \exists u_i > 0\\
0 &\text{o.w.}
\end{cases}.
\end{align}
Note that $\widehat{r}$ defined above is well-defined: when $\exists u_i > 0$, taking $u_j$ equal to the smallest positive sample value gives $\sum_{u_i<u_j} u_i=0$, and hence
$$\max \bigg \{ u_j:  (b+h) \sum\limits_{u_i <u_j} u_i \le h \sum\limits_{i=1}^{n} u_i  \bigg\} \ge  \min\{u_i: u_i>0\}>0.$$

As we stated in the previous section, $r^* >0$. Therefore, in the rare event when $\widehat{r}=0$, which happens when $u_i=0$ for all $i$, we need to take more samples to ensure $\exists u_i > 0$.
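As a hypothetical numerical example, let $n=3$ with sample values $u=(0.2,\, 0.5,\, 1)$, and let $h=1$ and $b=2$, so that $h\sum_{i} u_i=1.7$ and $b+h=3$. For $u_j=0.2$ and $u_j=0.5$, the left-hand sides of the condition in Equation \ref{eq:rhatdef} are $0$ and $3(0.2)=0.6$, respectively, both at most $1.7$; for $u_j=1$, it is $3(0.2+0.5)=2.1>1.7$. The maximum is therefore attained at $u_j=0.5$, and $\widehat{r}=1/0.5=2$.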

\section{Relative error guarantee}


In this section, we explore the relationship between the number of random yield samples, the relative performance of the policy obtained by applying SAA to those samples, and the confidence level of such a performance guarantee, under the following assumptions about the random variable $U$:

\begin{ass} [Upper bound  on $U$] \label{ass:ubound}
The support of $U$ has a known upper bound denoted by $M$, i.e., with probability 1, we have $U \le M$.
\end{ass}

\begin{ass} [Lower bound on $\Ex U$] \label{ass:lbound}
The expected yield has a known positive lower bound denoted by $m$, i.e., $0<m \le \Ex U$.
\end{ass}

We want to determine a function $N(\epsilon,\delta, h,b,m,M)$ such that if the number of samples $n\ge N$, then the optimal solution to the SAA counterpart defined on $n$ samples has expected cost $C(\widehat{R})$ that is at most $(1+\epsilon)C(r^*)$,  with probability at least $1-\delta$.


We prove the result in two steps. We first establish a connection between subgradients of $C(r)$ at a point and bounds on the relative error of the objective function. In the next step, we determine the number of samples which guarantees, with some confidence, the required bound on the norms of subgradient of $C(r)$ and as a result, guarantees the relative error of the objective function. Finally, we combine these two results to find the number of samples that guarantees the specified relative error of the solution found by SAA with the specified confidence.

Similar to \cite{Levi2009}, we define closeness of a solution to an optimal solution in terms of the norms of subgradients at that point. The following definition makes this notion of closeness precise:
\begin{defn}[$\alpha$-accurate] 
Let $f:\RR \to \RR$. A point $r$ in $\RR$ is $\alpha$-accurate for $f$ if there exists a subgradient $v$ of $f$ at $r$ with absolute value at most $\alpha$, i.e., $\exists v \in \partial f(r)$ with $|v| \le \alpha$.
\end{defn}
Equivalently, since the subdifferential of a convex function is the closed interval between its one-sided derivatives, $\partial f(r)=[f^l(r),f^r(r)]$, a point is $\alpha$-accurate if and only if  $f^r(r) \ge -\alpha$ and $f^l(r) \le \alpha$.
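For instance, in the hypothetical two-point yield example given earlier ($\pr[U=0.5]=\pr[U=1]=\frac{1}{2}$, $h=1$, $b=2$), we have $\partial C(1)=[C^l(1),C^r(1)]=[-1.5,0]$, so $r=1$ is $\alpha$-accurate for every $\alpha \ge 0$, whereas $r=0.9$, where $C$ is differentiable with $C'(0.9)=-1.5$, is $\alpha$-accurate only for $\alpha \ge 1.5$.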


\begin{lem}\label{lem:lowerb}
Let $r$ be $\alpha$-accurate for $C$. Then, we have:
\begin{itemize}
\item [(i)] $C(r)-C(r^*) \le \alpha |r-r^*|$
\item [(ii)] $C(r^*)  \ge \dfrac{hb\Ex U-\max (b,h)\alpha}{(b+h)}|r^*-r| $
\end{itemize}
\end{lem}
\begin{proof} Suppose $r$ is $\alpha$-accurate. Either we have $r \le r^*$ or $r > r^*$. 

{\boldmath \textbf{Case} $r \le r^*$}: If the realized yield $u$ is such that $ru \ge 1$, then $r^*$ incurs a higher cost than $r$ by exactly $hu(r^*-r)$. On the other hand, if $u$ is such that $r^*u \le 1$, then $r$ incurs a higher cost than $r^*$ by exactly $bu(r^*-r)$. Finally, for values of $u$ such that $r^*u > 1>ru$, we have $C(r,u)-C(r^*,u) \le C(r,u)=b(1-ru) \le bu(r^*-r)$. The first inequality follows from the fact that $C(r^*,u) \ge 0$, and the last inequality follows from assuming $r^*u > 1$.

Therefore, $C(r,u)-C(r^*,u)$ equals $-hu(r^*-r)$ when $ru \ge 1$ and is at most $bu(r^*-r)$ when $ru < 1$. Taking expectations with respect to $U$, we have:
\begin{align}\label{eq:delta1}
C(r)-C(r^*) \le (r^*-r) \bigg[-h\int_{ur \ge 1}ud\mathbf{p} + b \int_{ur < 1}ud\mathbf{p} \bigg] =-(r^*-r)C^r(r) \le \alpha (r^*-r),
\end{align}
where the equality follows from the definition of $C^r(r)$, and the second inequality follows from the fact that $r$ is $\alpha$-accurate and therefore $ C^r(r) \ge -\alpha$.

Based on the above arguments, $r^*$ incurs a cost of at least $ \Ex \big[h(r^*-r) u \ind{ur \ge 1}\big]$, i.e., we have:
\begin{align} \label{eq:lower1}
C(r^*) \ge h(r^*-r) \int_{ur \ge 1}ud\mathbf{p} \ge (r^*-r) \frac{h b\Ex U-h\alpha}{(b+h)},
\end{align}
where the second inequality follows from the fact that $C^r(r) \ge -\alpha$ and the definition of $C^r(r)$.

{\boldmath \textbf{Case} $r > r^*$}: Similar to the case $r \le r^*$, for realizations of $u$ such that $ru \le 1$, $r^*$ incurs higher cost than $r$ by exactly $bu(r-r^*)$. For other realizations, the difference between the cost incurred by $r$ and $r^*$ is at most $hu(r-r^*)$. Therefore,
\begin{align} \label{eq:delta2}
C(r)-C(r^*) \le (r-r^*) \bigg[-b\int_{ur \le 1}ud\mathbf{p} + h \int_{ur > 1}ud\mathbf{p} \bigg] =(r-r^*)C^l(r) \le \alpha (r-r^*),
\end{align}
where the equality follows from the definition of $C^l(r)$, and the second inequality follows from the fact that $r$ is $\alpha$-accurate and therefore $ C^l(r) \le \alpha$.



Also, $r^*$ incurs a cost of at least  $ \Ex \big[b(r-r^*) u \ind{ur \le 1}\big]$, i.e., we have
\begin{align} \label{eq:lower2}
C(r^*) \ge b(r-r^*) \int_{ur \le 1}u d\mathbf{p} \ge (r-r^*) \frac{h b\Ex U-b\alpha}{(b+h)},
\end{align}
where the second inequality follows from the fact that $C^l(r) \le \alpha$ and the definition of $C^l(r)$.

Combining inequalities \ref{eq:delta1} and \ref{eq:delta2}, we obtain $C(r)-C(r^*) \le \alpha |r^*-r|$, and thus part (i). Combining inequalities \ref{eq:lower1} and \ref{eq:lower2}, we obtain:
\begin{align*} 
C(r^*) \ge   \frac{h b\Ex U-\max \{b,h\}\alpha}{(b+h)}|r^*-r|,
\end{align*}
and thus part (ii).
\end{proof}


\begin{corollary} \label{cor:cor1}
For a given accuracy level $\epsilon \in (0,1]$, if $r$ is $\alpha$-accurate for $\alpha=\frac{1}{3} \epsilon m\min(b,h)$, then the cost of $r$ is at most $(1+\epsilon)$ times the optimal cost, i.e., $C(r) \le (1+\epsilon) C(r^*)$.
\end{corollary}
\begin{proof}
Based on the results of Lemma \ref{lem:lowerb}, it is sufficient to have that $$\alpha |r-r^*| \le \epsilon \frac{hb\Ex U-\max (b,h)\alpha}{(b+h)}|r^*-r|,$$
or equivalently
 $$\alpha (b+h+\max(b,h)\epsilon) \le \epsilon hb\Ex U.$$
We have:
\begin{align*}
\alpha (b+h+\max(h,b)\epsilon) \le \alpha (b+h+\max(h,b)) \le 3\alpha \max (b,h) = \min(h,b) \epsilon m \max(h,b) = \epsilon m h b \le \epsilon hb\Ex U,
\end{align*}
where the first inequality follows from $\epsilon \le 1$, the second from $b+h \le 2\max(b,h)$, the first equality is obtained by substituting $\alpha$, the second from $\min(h,b)\max(h,b)=hb$, and the last inequality follows from the fact that $m \le \Ex U$ (Assumption \ref{ass:lbound}).
\end{proof}
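As a quick numerical check with hypothetical parameter values $h=1$, $b=2$, $m=0.5$, and $\epsilon=1$: we get $\alpha=\frac{1}{3}(1)(0.5)(1)=\frac{1}{6}$, and the required inequality holds since $\alpha(b+h+\max(b,h)\epsilon)=\frac{1}{6}(3+2)=\frac{5}{6} \le 1 = \epsilon hbm \le \epsilon hb\Ex U$.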

\begin{lem} \label{lem:alphan}
For each $\alpha>0$ and $0 <\delta<1$, if the number of samples is $n \ge \dfrac{\log (\frac{4}{\delta})}{2\alpha^2} [M(b+h)]^2$, then $\widehat{R}$, the solution to the SAA counterpart, is $\alpha$-accurate with probability at least $1-\delta$.
\end{lem}
\begin{proof} Fix $r$ arbitrarily. For $i=1,\ldots,n$ define random variables $X_i=U_i\big[h -(b+h) \ind{rU_i  \le 1}\big]$ and $Y_i=U_i\big[h -(b+h) \ind{rU_i  < 1}\big]$. Note that for any $i$, we have $\Ex X_i =C^l({r})$ and $\Ex Y_i =C^r({r})$. Therefore, the sample means of the $X_i$'s and $Y_i$'s, denoted by $\overline{X}$ and $\overline{Y}$, are unbiased estimators of $C^l({r})$ and $C^r({r})$, respectively.

Note that the $X_i$'s and $Y_i$'s are i.i.d. (since the $U_i$'s are sampled independently of one another). By Assumption \ref{ass:ubound}, we have $U_i \le M$ with probability 1; since each $X_i$ (and $Y_i$) equals either $hU_i \in [0,hM]$ or $-bU_i \in [-bM,0]$, we obtain $-bM \le X_i, Y_i \le hM$.

We have that for any $\alpha > 0 $:
\begin{align}\label{eq:lefthalf}
\prr {C^l(r)-\overline{X} \ge \alpha }=\prr {\Ex \overline{X}-\overline{X} \ge \alpha } \le \prr {|\overline{X} - \Ex \overline{X} |\ge \alpha }&\le 2 \exp \big(-\frac{2n\alpha^2}{[M(b+h)]^2}\big)\le \frac{1}{2}\delta,
\end{align}
where the equality follows from $ C^l(r)=\Ex \overline{X} $, the second inequality is a direct result of Hoeffding's inequality \citep{Hoeff1963}, and the last inequality follows from the assumed lower bound on $n$.
Similarly, we can show that $\prr { \overline{Y} -C^r(r)  \ge \alpha } \le \frac{1}{2}\delta$.

Since we chose $r$ arbitrarily, these bounds are valid for $r=\widehat{r}$ as well. Now, let $r=\widehat{r}$. Based on the way we defined $\widehat{r}$, we have $\overline{X} \le 0$ and $\overline{Y} \ge 0$ (see Equation \ref{eq:rhatdef}). Therefore, we have
\begin{align}
\pr [C^l(\widehat{r}) > \alpha ] \le\pr [C^l(\widehat{r}) - \overline{X} \ge  \alpha] \le \frac{1}{2}\delta ,
\end{align}
where the first inequality follows from $\overline{X} \le 0$, and the second inequality follows from Equation \ref{eq:lefthalf}. Similarly, we have $\pr [C^r(\widehat{r}) < -\alpha ]  \le \frac{1}{2}\delta $.

Therefore, the probability that $\widehat{r}$ is not $\alpha$-accurate, i.e.,  
$\prr {(C^r(\widehat{r}) < -\alpha)   \text{ or }  (C^l(\widehat{r}) > \alpha)}$, is at most $\frac{1}{2}\delta+\frac{1}{2}\delta=\delta$. Thus, with probability at least $1-\delta$, $\widehat{R}$ is  $\alpha$-accurate.
\end{proof}

\begin{thm}
Let $0<\epsilon \le 1$ be a specified accuracy level and $1-\delta$ be the confidence level. Suppose that $n \ge \log(\frac{4}{\delta}) \frac{9}{2 \epsilon^2} (\frac{M}{m} \frac{b+h}{ \min(h,b)})^2$ and $\widehat{R}$ was obtained by solving the SAA counterpart using $n$ i.i.d. samples of $U$. Then, with probability at least $1-\delta$, we have $C(\widehat{R}) \le (1+\epsilon) C(r^*)$.
\end{thm}
\begin{proof}
Let $\alpha=\frac{1}{3} \epsilon m\min(b,h)$. By Lemma \ref{lem:alphan}, $\widehat{R}$ is $\alpha$-accurate with probability at least $1-\delta$. By Corollary  \ref{cor:cor1}, if $\widehat{R}$ is $\alpha$-accurate, then $C(\widehat{R}) \le (1+\epsilon) C(r^*)$. Therefore, with probability at least $1-\delta$, we have $C(\widehat{R}) \le (1+\epsilon) C(r^*)$.
\end{proof}
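To get a sense of the magnitudes involved, consider hypothetical parameter values $\epsilon=0.1$, $\delta=0.05$, $b=h$, and $M/m=2$. Then $\frac{b+h}{\min(h,b)}=2$, and the bound becomes $n \ge \log(80) \cdot \frac{9}{2(0.1)^2} \cdot (2 \cdot 2)^2 \approx 4.38 \cdot 450 \cdot 16 \approx 31{,}600$ samples.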
\newpage
\bibliography{refs}
\newpage
\end{document}
